DROPS

Document

DOI: 10.4230/OASIcs.SLATE.2023.5

Web of Science Citation Gaps: An Automatic Approach to Detect Indexed but Missing Citations

Authors: David Rodrigues, António L. Lopes, and Fernando Batista

Published in: OASIcs, Volume 113, 12th Symposium on Languages, Applications and Technologies (SLATE 2023)

Abstract

The number of citations a research paper receives is a crucial metric for both researchers and institutions. However, since citation databases have their own source lists, finding all the citations of a given paper can be a challenge. As a result, there may be missing citations that are not counted towards a paper’s total citation count. To address this issue, we present an automated approach to find missing citations that leverages multiple indexing databases. In this research, Web of Science (WoS) serves as a case study and OpenAlex is used as a reference point for comparison. For a given paper, we identify all citing papers found in both research databases. Then, for each citing paper, we check if it is indexed in WoS but not listed in WoS as a citing paper, in order to determine if it is a missing citation. In our experiments, from a set of 1539 papers indexed by WoS, we found 696 missing citations. This outcome proves the success of our approach, and reveals that WoS does not always consider the full list of citing papers of a given publication, even when these citing papers are indexed by WoS. We also found that WoS has a higher chance of missing information for more recent publications. These findings provide relevant insights about this indexing research database, and provide enough motivation for considering other research databases in our study, such as Scopus and Google Scholar, in order to improve the matching and querying algorithms, and to reduce false positives, towards providing a more comprehensive and accurate view of the citations of a paper.

Cite as

David Rodrigues, António L. Lopes, and Fernando Batista. Web of Science Citation Gaps: An Automatic Approach to Detect Indexed but Missing Citations. In 12th Symposium on Languages, Applications and Technologies (SLATE 2023). Open Access Series in Informatics (OASIcs), Volume 113, pp. 5:1-5:11, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2023)

@InProceedings{rodrigues_et_al:OASIcs.SLATE.2023.5,
  author =	{Rodrigues, David and Lopes, Ant\'{o}nio L. and Batista, Fernando},
  title =	{{Web of Science Citation Gaps: An Automatic Approach to Detect Indexed but Missing Citations}},
  booktitle =	{12th Symposium on Languages, Applications and Technologies (SLATE 2023)},
  pages =	{5:1--5:11},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-291-4},
  ISSN =	{2190-6807},
  year =	{2023},
  volume =	{113},
  editor =	{Sim\~{o}es, Alberto and Ber\'{o}n, Mario Marcelo and Portela, Filipe},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2023.5},
  URN =		{urn:nbn:de:0030-drops-185199},
  doi =		{10.4230/OASIcs.SLATE.2023.5},
  annote =	{Keywords: Research Databases, Citations, Citation Databases, Web of Science, OpenAlex}
}

Document

DOI: 10.4230/OASIcs.SLATE.2022.10

Comparing Different Approaches for Detecting Hate Speech in Online Portuguese Comments

Authors: Bernardo Cunha Matos, Raquel Bento Santos, Paula Carvalho, Ricardo Ribeiro, and Fernando Batista

Published in: OASIcs, Volume 104, 11th Symposium on Languages, Applications and Technologies (SLATE 2022)

Abstract

Online Hate Speech (OHS) has been growing dramatically on social media, which has motivated researchers to develop a diversity of methods for its automated detection. However, the detection of OHS in Portuguese is still little studied. To fill this gap, we explored different models that proved to be successful in the literature to address this task. In particular, we have explored transfer learning approaches, based on existing BERT-like pre-trained models. The performed experiments were based on CO-HATE, a corpus of YouTube comments posted by the Portuguese online community that was manually labeled by different annotators. Among other categories, those comments were labeled regarding the presence of hate speech and the type of hate speech, specifically overt and covert hate speech. We have assessed the impact of using annotations from different annotators on the performance of such models. In addition, we have analyzed the impact of distinguishing overt and covert hate speech. The results achieved show the importance of considering the annotator’s profile in the development of hate speech detection models. Regarding the hate speech type, the results obtained do not allow any conclusion to be drawn about which type is easier to detect. Finally, we show that pre-processing does not seem to have a significant impact on the performance of this specific task.

Cite as

Bernardo Cunha Matos, Raquel Bento Santos, Paula Carvalho, Ricardo Ribeiro, and Fernando Batista. Comparing Different Approaches for Detecting Hate Speech in Online Portuguese Comments. In 11th Symposium on Languages, Applications and Technologies (SLATE 2022). Open Access Series in Informatics (OASIcs), Volume 104, pp. 10:1-10:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

@InProceedings{matos_et_al:OASIcs.SLATE.2022.10,
  author =	{Matos, Bernardo Cunha and Santos, Raquel Bento and Carvalho, Paula and Ribeiro, Ricardo and Batista, Fernando},
  title =	{{Comparing Different Approaches for Detecting Hate Speech in Online Portuguese Comments}},
  booktitle =	{11th Symposium on Languages, Applications and Technologies (SLATE 2022)},
  pages =	{10:1--10:12},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-245-7},
  ISSN =	{2190-6807},
  year =	{2022},
  volume =	{104},
  editor =	{Cordeiro, Jo\~{a}o and Pereira, Maria Jo\~{a}o and Rodrigues, Nuno F. and Pais, Sebasti\~{a}o},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2022.10},
  URN =		{urn:nbn:de:0030-drops-167560},
  doi =		{10.4230/OASIcs.SLATE.2022.10},
  annote =	{Keywords: Hate Speech, Text Classification, Transfer Learning, Supervised Learning, Deep Learning}
}

Document

DOI: 10.4230/OASIcs.SLATE.2022.11

Semi-Supervised Annotation of Portuguese Hate Speech Across Social Media Domains

Authors: Raquel Bento Santos, Bernardo Cunha Matos, Paula Carvalho, Fernando Batista, and Ricardo Ribeiro

Published in: OASIcs, Volume 104, 11th Symposium on Languages, Applications and Technologies (SLATE 2022)

Abstract

With the increasing spread of hate speech (HS) on social media, it becomes urgent to develop models that can help detect it automatically. Typically, such models require large-scale annotated corpora, which are still scarce in languages such as Portuguese. However, creating manually annotated corpora is a very expensive and time-consuming task. To address this problem, we propose an ensemble of two semi-supervised models that can be used to automatically create a corpus representative of online hate speech in Portuguese. The first model combines Generative Adversarial Networks and a BERT-based model. The second model is based on label propagation, and consists of propagating labels from existing annotated corpora to the unlabeled data, by exploring the notion of similarity. We have explored the annotations of three existing corpora (CO-HATE, ToLR-BR, and HPHS) in order to automatically annotate FIGHT, a corpus composed of geolocated tweets produced in the Portuguese territory. Through the process of selecting the best model and the corresponding setup, we have tested different pre-trained embeddings, performed experiments using different training subsets, labeled by different annotators with different perspectives, and performed several experiments with active learning. Furthermore, this work explores back translation as a means to automatically generate additional hate speech samples. The best results were achieved by combining all the labeled datasets, obtaining 0.664 F1-score for the Hate Speech class in FIGHT.

Cite as

Raquel Bento Santos, Bernardo Cunha Matos, Paula Carvalho, Fernando Batista, and Ricardo Ribeiro. Semi-Supervised Annotation of Portuguese Hate Speech Across Social Media Domains. In 11th Symposium on Languages, Applications and Technologies (SLATE 2022). Open Access Series in Informatics (OASIcs), Volume 104, pp. 11:1-11:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2022)

@InProceedings{santos_et_al:OASIcs.SLATE.2022.11,
  author =	{Santos, Raquel Bento and Matos, Bernardo Cunha and Carvalho, Paula and Batista, Fernando and Ribeiro, Ricardo},
  title =	{{Semi-Supervised Annotation of Portuguese Hate Speech Across Social Media Domains}},
  booktitle =	{11th Symposium on Languages, Applications and Technologies (SLATE 2022)},
  pages =	{11:1--11:14},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-245-7},
  ISSN =	{2190-6807},
  year =	{2022},
  volume =	{104},
  editor =	{Cordeiro, Jo\~{a}o and Pereira, Maria Jo\~{a}o and Rodrigues, Nuno F. and Pais, Sebasti\~{a}o},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2022.11},
  URN =		{urn:nbn:de:0030-drops-167570},
  doi =		{10.4230/OASIcs.SLATE.2022.11},
  annote =	{Keywords: Hate Speech, Semi-Supervised Learning, Semi-Automatic Annotation}
}

Document

DOI: 10.4230/OASIcs.SLATE.2021.12

Semantic Search of Mobile Applications Using Word Embeddings

Authors: João Coelho, António Neto, Miguel Tavares, Carlos Coutinho, Ricardo Ribeiro, and Fernando Batista

Published in: OASIcs, Volume 94, 10th Symposium on Languages, Applications and Technologies (SLATE 2021)

Abstract

This paper proposes a set of approaches for the semantic search of mobile applications, based on their name and on the unstructured textual information contained in their description. The proposed approaches make use of word-level, character-level, and contextual word-embeddings that have been trained or fine-tuned using a dataset of about 500 thousand mobile apps, collected in the scope of this work. The proposed approaches have been evaluated using a public dataset that includes information about 43 thousand applications, and 56 manually annotated non-exact queries. Our results show that both character-level embeddings trained on our data, and fine-tuned RoBERTa models surpass the performance of the other existing retrieval strategies reported in the literature.

Cite as

João Coelho, António Neto, Miguel Tavares, Carlos Coutinho, Ricardo Ribeiro, and Fernando Batista. Semantic Search of Mobile Applications Using Word Embeddings. In 10th Symposium on Languages, Applications and Technologies (SLATE 2021). Open Access Series in Informatics (OASIcs), Volume 94, pp. 12:1-12:12, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

@InProceedings{coelho_et_al:OASIcs.SLATE.2021.12,
  author =	{Coelho, Jo\~{a}o and Neto, Ant\'{o}nio and Tavares, Miguel and Coutinho, Carlos and Ribeiro, Ricardo and Batista, Fernando},
  title =	{{Semantic Search of Mobile Applications Using Word Embeddings}},
  booktitle =	{10th Symposium on Languages, Applications and Technologies (SLATE 2021)},
  pages =	{12:1--12:12},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-202-0},
  ISSN =	{2190-6807},
  year =	{2021},
  volume =	{94},
  editor =	{Queir\'{o}s, Ricardo and Pinto, M\'{a}rio and Sim\~{o}es, Alberto and Portela, Filipe and Pereira, Maria Jo\~{a}o},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2021.12},
  URN =		{urn:nbn:de:0030-drops-144292},
  doi =		{10.4230/OASIcs.SLATE.2021.12},
  annote =	{Keywords: Semantic Search, Word Embeddings, Elasticsearch, Mobile Applications}
}

Document

DOI: 10.4230/OASIcs.SLATE.2021.17

Sentiment Analysis of Portuguese Economic News

Authors: Cátia Tavares, Ricardo Ribeiro, and Fernando Batista

Published in: OASIcs, Volume 94, 10th Symposium on Languages, Applications and Technologies (SLATE 2021)

Abstract

This paper proposes a rule-based method for automatic polarity detection over economic news texts, which proved suitable for detecting the sentiment in Portuguese economic news. The data used in our experiments consists of 400 manually annotated sentences extracted from economic news, used for evaluation, and about 90 thousand Portuguese economic news articles, extracted from two well-known Portuguese newspapers, covering the period from 2010 to 2020, that have been used for training our systems. In order to perform sentiment analysis of economic news, we have also tested the adaptation of existing pre-trained modules, and also performed experiments with a set of Machine Learning approaches, and self-training. Experimental results show that our rule-based approach, which uses manually written rules related to the economic context, achieves the best results for automatically detecting the polarity of economic news, largely surpassing the other approaches.

Cite as

Cátia Tavares, Ricardo Ribeiro, and Fernando Batista. Sentiment Analysis of Portuguese Economic News. In 10th Symposium on Languages, Applications and Technologies (SLATE 2021). Open Access Series in Informatics (OASIcs), Volume 94, pp. 17:1-17:13, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)

@InProceedings{tavares_et_al:OASIcs.SLATE.2021.17,
  author =	{Tavares, C\'{a}tia and Ribeiro, Ricardo and Batista, Fernando},
  title =	{{Sentiment Analysis of Portuguese Economic News}},
  booktitle =	{10th Symposium on Languages, Applications and Technologies (SLATE 2021)},
  pages =	{17:1--17:13},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-202-0},
  ISSN =	{2190-6807},
  year =	{2021},
  volume =	{94},
  editor =	{Queir\'{o}s, Ricardo and Pinto, M\'{a}rio and Sim\~{o}es, Alberto and Portela, Filipe and Pereira, Maria Jo\~{a}o},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2021.17},
  URN =		{urn:nbn:de:0030-drops-144347},
  doi =		{10.4230/OASIcs.SLATE.2021.17},
  annote =	{Keywords: Sentiment Analysis, Economic News, Portuguese Language}
}

Document

DOI: 10.4230/OASIcs.SLATE.2020.3

Detection of Emerging Words in Portuguese Tweets

Authors: Afonso Pinto, Helena Moniz, and Fernando Batista

Published in: OASIcs, Volume 83, 9th Symposium on Languages, Applications and Technologies (SLATE 2020)

Abstract

This paper tackles the problem of detecting emerging words in a language, based on social network content. It proposes an approach for detecting new words on Twitter, and reports the achieved results for a collection of 8 million Portuguese tweets. This study uses geolocated tweets, collected between January 2018 and June 2019, and written in the Portuguese territory. The first six months of the data were used to define an initial vocabulary of known words, and the following 12 months were used for identifying new words, thus testing our approach. The set of resulting words was manually analyzed, revealing a number of distinct events, and suggesting that Twitter may be a valuable resource for researching neology, and the dynamics of a language.

Cite as

Afonso Pinto, Helena Moniz, and Fernando Batista. Detection of Emerging Words in Portuguese Tweets. In 9th Symposium on Languages, Applications and Technologies (SLATE 2020). Open Access Series in Informatics (OASIcs), Volume 83, pp. 3:1-3:10, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)

@InProceedings{pinto_et_al:OASIcs.SLATE.2020.3,
  author =	{Pinto, Afonso and Moniz, Helena and Batista, Fernando},
  title =	{{Detection of Emerging Words in Portuguese Tweets}},
  booktitle =	{9th Symposium on Languages, Applications and Technologies (SLATE 2020)},
  pages =	{3:1--3:10},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-165-8},
  ISSN =	{2190-6807},
  year =	{2020},
  volume =	{83},
  editor =	{Sim\~{o}es, Alberto and Henriques, Pedro Rangel and Queir\'{o}s, Ricardo},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2020.3},
  URN =		{urn:nbn:de:0030-drops-130164},
  doi =		{10.4230/OASIcs.SLATE.2020.3},
  annote =	{Keywords: Emerging words, Twitter, Portuguese language}
}

Document

DOI: 10.4230/OASIcs.SLATE.2020.7

Towards the Identification of Fake News in Portuguese

Authors: João Rodrigues, Ricardo Ribeiro, and Fernando Batista

Published in: OASIcs, Volume 83, 9th Symposium on Languages, Applications and Technologies (SLATE 2020)

Abstract

All over the world, many initiatives have been taken to fight fake news. Governments (e.g., France, Germany, United Kingdom and Spain), in their own way, started to take action regarding legal accountability for those who manufacture or propagate fake news. Different media outlets have also taken a multitude of initiatives to deal with this phenomenon, such as the increase of discipline, accuracy and transparency of publications made internally. Some structural changes have lately been made in said companies and entities in order to better evaluate news in general. As such, many teams were built entirely to fight fake news - the so-called "fact-checkers". These have been adopting different techniques in order to do so: from the typical use of journalists to find out the truth behind a controversial statement, to data scientists that apply forefront techniques such as text mining and machine learning to support the journalist’s decisions. Many of these entities, which aim to maintain or improve their reputation, started to focus on high standards for quality and reliable information, which led to the creation of official and dedicated departments for fact-checking. In this review paper, not only will we highlight relevant contributions and efforts across the fake news identification and classification status quo, but we will also contextualize the Portuguese language state of affairs in the current state-of-the-art.

Cite as

João Rodrigues, Ricardo Ribeiro, and Fernando Batista. Towards the Identification of Fake News in Portuguese. In 9th Symposium on Languages, Applications and Technologies (SLATE 2020). Open Access Series in Informatics (OASIcs), Volume 83, pp. 7:1-7:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)

@InProceedings{rodrigues_et_al:OASIcs.SLATE.2020.7,
  author =	{Rodrigues, Jo\~{a}o and Ribeiro, Ricardo and Batista, Fernando},
  title =	{{Towards the Identification of Fake News in Portuguese}},
  booktitle =	{9th Symposium on Languages, Applications and Technologies (SLATE 2020)},
  pages =	{7:1--7:14},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-165-8},
  ISSN =	{2190-6807},
  year =	{2020},
  volume =	{83},
  editor =	{Sim\~{o}es, Alberto and Henriques, Pedro Rangel and Queir\'{o}s, Ricardo},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2020.7},
  URN =		{urn:nbn:de:0030-drops-130207},
  doi =		{10.4230/OASIcs.SLATE.2020.7},
  annote =	{Keywords: Fake News, Portuguese Language, Fact-checking}
}

Document

Short Paper

DOI: 10.4230/OASIcs.SLATE.2020.12

Different Lexicon-Based Approaches to Emotion Identification in Portuguese Tweets (Short Paper)

Authors: Soraia Filipe, Fernando Batista, and Ricardo Ribeiro

Published in: OASIcs, Volume 83, 9th Symposium on Languages, Applications and Technologies (SLATE 2020)

Abstract

This paper presents the existing literature on the identification of emotions and describes various lexica-based approaches and translation strategies to identify emotions in Portuguese tweets. A dataset of tweets was manually annotated to evaluate our classifier and also to assess the difficulty of the task. A lexicon-based approach was used in order to classify the presence or absence of eight different emotions in a tweet. Different strategies have been applied to refine and improve an existing and widely used lexicon, by means of automatic machine translation and aligned word embeddings. We tested six different classification approaches, exploring different ways of directly applying resources available for English by means of different translation strategies. The achieved results suggest that a better performance can be obtained both by improving a lexicon and by directly translating tweets into English and then applying an existing English lexicon.

Cite as

Soraia Filipe, Fernando Batista, and Ricardo Ribeiro. Different Lexicon-Based Approaches to Emotion Identification in Portuguese Tweets (Short Paper). In 9th Symposium on Languages, Applications and Technologies (SLATE 2020). Open Access Series in Informatics (OASIcs), Volume 83, pp. 12:1-12:8, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2020)

@InProceedings{filipe_et_al:OASIcs.SLATE.2020.12,
  author =	{Filipe, Soraia and Batista, Fernando and Ribeiro, Ricardo},
  title =	{{Different Lexicon-Based Approaches to Emotion Identification in Portuguese Tweets}},
  booktitle =	{9th Symposium on Languages, Applications and Technologies (SLATE 2020)},
  pages =	{12:1--12:8},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-165-8},
  ISSN =	{2190-6807},
  year =	{2020},
  volume =	{83},
  editor =	{Sim\~{o}es, Alberto and Henriques, Pedro Rangel and Queir\'{o}s, Ricardo},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2020.12},
  URN =		{urn:nbn:de:0030-drops-130252},
  doi =		{10.4230/OASIcs.SLATE.2020.12},
  annote =	{Keywords: Emotion detection, tweets, Portuguese Language, Emotion lexicon}
}

Document

Complete Volume

DOI: 10.4230/OASIcs.SLATE.2019

OASIcs, Volume 74, SLATE'19, Complete Volume

Authors: Ricardo Rodrigues, Jan Janoušek, Luís Ferreira, Luísa Coheur, Fernando Batista, and Hugo Gonçalo Oliveira

Published in: OASIcs, Volume 74, 8th Symposium on Languages, Applications and Technologies (SLATE 2019)

Abstract

OASIcs, Volume 74, SLATE'19, Complete Volume

Cite as

8th Symposium on Languages, Applications and Technologies (SLATE 2019). Open Access Series in Informatics (OASIcs), Volume 74, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

@Proceedings{rodrigues_et_al:OASIcs.SLATE.2019,
  title =	{{OASIcs, Volume 74, SLATE'19, Complete Volume}},
  booktitle =	{8th Symposium on Languages, Applications and Technologies (SLATE 2019)},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-114-6},
  ISSN =	{2190-6807},
  year =	{2019},
  volume =	{74},
  editor =	{Rodrigues, Ricardo and Janou\v{s}ek, Jan and Ferreira, Lu{\'\i}s and Coheur, Lu{\'\i}sa and Batista, Fernando and Gon\c{c}alo Oliveira, Hugo},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2019},
  URN =		{urn:nbn:de:0030-drops-109008},
  doi =		{10.4230/OASIcs.SLATE.2019},
  annote =	{Keywords: Computing methodologies, Natural language processing, Software and its engineering, Compilers; Information systems, World Wide Web}
}

Document

Front Matter

DOI: 10.4230/OASIcs.SLATE.2019.0

Front Matter, Table of Contents, Preface, Conference Organization

Authors: Ricardo Rodrigues, Jan Janoušek, Luís Ferreira, Luísa Coheur, Fernando Batista, and Hugo Gonçalo Oliveira

Published in: OASIcs, Volume 74, 8th Symposium on Languages, Applications and Technologies (SLATE 2019)

Abstract

Front Matter, Table of Contents, Preface, Conference Organization

Cite as

8th Symposium on Languages, Applications and Technologies (SLATE 2019). Open Access Series in Informatics (OASIcs), Volume 74, pp. 0:i-0:xviii, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2019)

@InProceedings{rodrigues_et_al:OASIcs.SLATE.2019.0,
  author =	{Rodrigues, Ricardo and Janou\v{s}ek, Jan and Ferreira, Lu{\'\i}s and Coheur, Lu{\'\i}sa and Batista, Fernando and Gon\c{c}alo Oliveira, Hugo},
  title =	{{Front Matter, Table of Contents, Preface, Conference Organization}},
  booktitle =	{8th Symposium on Languages, Applications and Technologies (SLATE 2019)},
  pages =	{0:i--0:xviii},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-114-6},
  ISSN =	{2190-6807},
  year =	{2019},
  volume =	{74},
  editor =	{Rodrigues, Ricardo and Janou\v{s}ek, Jan and Ferreira, Lu{\'\i}s and Coheur, Lu{\'\i}sa and Batista, Fernando and Gon\c{c}alo Oliveira, Hugo},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2019.0},
  URN =		{urn:nbn:de:0030-drops-108679},
  doi =		{10.4230/OASIcs.SLATE.2019.0},
  annote =	{Keywords: Front Matter, Table of Contents, Preface, Conference Organization}
}

Document

DOI: 10.4230/OASIcs.SLATE.2017.21

Adapting Speech Recognition in Augmented Reality for Mobile Devices in Outdoor Environments

Authors: Rui Pascoal, Ricardo Ribeiro, Fernando Batista, and Ana de Almeida

Published in: OASIcs, Volume 56, 6th Symposium on Languages, Applications and Technologies (SLATE 2017)

Abstract

This paper describes the process of integrating automatic speech recognition (ASR) into a mobile application and explores the benefits and challenges of integrating speech with augmented reality (AR) in outdoor environments. Augmented reality allows end-users to interact with the information displayed and perform tasks, while increasing the user’s perception about the real world by adding virtual information to it. Speech is the most natural way of communication: it allows hands-free interaction and may allow end-users to quickly and easily access a range of features available. Speech recognition technology is available in most current mobile devices, but it often uses the Internet to receive the corresponding transcript from remote servers, e.g., Google speech recognition. However, in some outdoor environments, the Internet is not always available or may be offered at poor quality. We integrated an off-line automatic speech recognition module into an AR application for outdoor usage that does not require an Internet connection. Currently, speech interaction is used within the application to access five different features, namely: to take a photo, shoot a film, communicate, perform messaging-related tasks, and request information, either geographic, biometric, or climatic. The application makes available solutions to manage and interact with the mobile device, offering good usability. We have compared the online and off-line speech recognition systems in order to assess their adequacy to the tasks. Both systems were tested under different conditions, commonly found in outdoor environments, such as: Internet access quality, presence of noise, and distractions.

Cite as

Rui Pascoal, Ricardo Ribeiro, Fernando Batista, and Ana de Almeida. Adapting Speech Recognition in Augmented Reality for Mobile Devices in Outdoor Environments. In 6th Symposium on Languages, Applications and Technologies (SLATE 2017). Open Access Series in Informatics (OASIcs), Volume 56, pp. 21:1-21:14, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)

@InProceedings{pascoal_et_al:OASIcs.SLATE.2017.21,
  author =	{Pascoal, Rui and Ribeiro, Ricardo and Batista, Fernando and de Almeida, Ana},
  title =	{{Adapting Speech Recognition in Augmented Reality for Mobile Devices in Outdoor Environments}},
  booktitle =	{6th Symposium on Languages, Applications and Technologies (SLATE 2017)},
  pages =	{21:1--21:14},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-056-9},
  ISSN =	{2190-6807},
  year =	{2017},
  volume =	{56},
  editor =	{Queir\'{o}s, Ricardo and Pinto, M\'{a}rio and Sim\~{o}es, Alberto and Leal, Jos\'{e} Paulo and Varanda, Maria Jo\~{a}o},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2017.21},
  URN =		{urn:nbn:de:0030-drops-79541},
  doi =		{10.4230/OASIcs.SLATE.2017.21},
  annote =	{Keywords: Speech Recognition, Natural Language Processing, Sphinx for Mobile Devices, Augmented Reality, Outdoor Environments}
}

Document

DOI: 10.4230/OASIcs.SLATE.2014.185

Detecting a Tweet’s Topic within a Large Number of Portuguese Twitter Trends

Authors: Hugo Rosa, João Paulo Carvalho, and Fernando Batista

Published in: OASIcs, Volume 38, 3rd Symposium on Languages, Applications and Technologies (2014)

Abstract

In this paper we address Twitter Topic Detection in the presence of a large number of trending topics. We use a new technique, called Twitter Topic Fuzzy Fingerprints, and compare it with two popular text classification techniques, Support Vector Machines (SVM) and k-Nearest Neighbours (kNN). Preliminary results show that it outperforms the other two techniques, while still being much faster, which is an essential feature when processing large volumes of streaming data. We focused on a data set of Portuguese language tweets and the respective top trends as indicated by Twitter.

Cite as

Hugo Rosa, João Paulo Carvalho, and Fernando Batista. Detecting a Tweet’s Topic within a Large Number of Portuguese Twitter Trends. In 3rd Symposium on Languages, Applications and Technologies. Open Access Series in Informatics (OASIcs), Volume 38, pp. 185-199, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2014)

@InProceedings{rosa_et_al:OASIcs.SLATE.2014.185,
  author =	{Rosa, Hugo and Carvalho, Jo\~{a}o Paulo and Batista, Fernando},
  title =	{{Detecting a Tweet's Topic within a Large Number of Portuguese Twitter Trends}},
  booktitle =	{3rd Symposium on Languages, Applications and Technologies},
  pages =	{185--199},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-939897-68-2},
  ISSN =	{2190-6807},
  year =	{2014},
  volume =	{38},
  editor =	{Pereira, Maria Jo\~{a}o Varanda and Leal, Jos\'{e} Paulo and Sim\~{o}es, Alberto},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2014.185},
  URN =		{urn:nbn:de:0030-drops-45696},
  doi =		{10.4230/OASIcs.SLATE.2014.185},
  annote =	{Keywords: topic detection, social networks data mining, Twitter, Portuguese language}
}

Document

DOI: 10.4230/OASIcs.SLATE.2014.275

Expanding a Database of Portuguese Tweets

Authors: Gaspar Brogueira, Fernando Batista, João Paulo Carvalho, and Helena Moniz

Published in: OASIcs, Volume 38, 3rd Symposium on Languages, Applications and Technologies (2014)

Abstract

This paper describes an existing database of geolocated tweets that were produced in Portuguese regions and proposes an approach to further expand it. The existing database covers eight consecutive days of collected tweets, totaling about 300 thousand tweets, produced by about 11 thousand different users. A detailed analysis of the content of the messages suggests a predominance of young authors who use Twitter as a way of reaching their colleagues with their feelings, ideas and comments. In order to further characterize this community of young people, we propose a method for retrieving additional tweets produced by the same set of authors already in the database. Our goal is to further extend the knowledge about each user of this community, making it possible to automatically characterize each user by the content he/she produces, cluster users, and open other possibilities in the scope of social analysis.

Cite as

Gaspar Brogueira, Fernando Batista, João Paulo Carvalho, and Helena Moniz. Expanding a Database of Portuguese Tweets. In 3rd Symposium on Languages, Applications and Technologies. Open Access Series in Informatics (OASIcs), Volume 38, pp. 275-282, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2014)

@InProceedings{brogueira_et_al:OASIcs.SLATE.2014.275,
  author =	{Brogueira, Gaspar and Batista, Fernando and Carvalho, Jo\~{a}o Paulo and Moniz, Helena},
  title =	{{Expanding a Database of Portuguese Tweets}},
  booktitle =	{3rd Symposium on Languages, Applications and Technologies},
  pages =	{275--282},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-939897-68-2},
  ISSN =	{2190-6807},
  year =	{2014},
  volume =	{38},
  editor =	{Pereira, Maria Jo\~{a}o Varanda and Leal, Jos\'{e} Paulo and Sim\~{o}es, Alberto},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2014.275},
  URN =		{urn:nbn:de:0030-drops-45763},
  doi =		{10.4230/OASIcs.SLATE.2014.275},
  annote =	{Keywords: Twitter, corpus of Portuguese tweets, Twitter API, natural language processing, text analysis}
}

Document

DOI: 10.4230/OASIcs.SLATE.2013.259

Comparing Different Methods for Disfluency Structure Detection

Authors: Henrique Medeiros, Fernando Batista, Helena Moniz, Isabel Trancoso, and Luis Nunes

Published in: OASIcs, Volume 29, 2nd Symposium on Languages, Applications and Technologies (2013)

Abstract

This paper presents a number of experiments focusing on assessing the performance of different machine learning methods on the identification of disfluencies and their distinct structural regions over speech data. Several machine learning methods have been applied, namely Naive Bayes, Logistic Regression, Classification and Regression Trees (CARTs), J48 and Multilayer Perceptron. Our experiments show that CARTs outperform the other methods on the identification of the distinct structural disfluent regions. Reported experiments are based on audio segmentation and prosodic features, calculated from a corpus of university lectures in European Portuguese, containing about 32h of speech and about 7.7% of disfluencies. The set of features automatically extracted from the forced alignment corpus proved to be discriminant of the regions contained in the production of a disfluency. This work shows that using fully automatic prosodic features, disfluency structural regions can be reliably identified using CARTs, where the best results achieved correspond to 81.5% precision, 27.6% recall, and 41.2% F-measure. The best results concern the detection of the interregnum, followed by the detection of the interruption point.

Cite as

Henrique Medeiros, Fernando Batista, Helena Moniz, Isabel Trancoso, and Luis Nunes. Comparing Different Methods for Disfluency Structure Detection. In 2nd Symposium on Languages, Applications and Technologies. Open Access Series in Informatics (OASIcs), Volume 29, pp. 259-269, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2013)

@InProceedings{medeiros_et_al:OASIcs.SLATE.2013.259,
  author =	{Medeiros, Henrique and Batista, Fernando and Moniz, Helena and Trancoso, Isabel and Nunes, Luis},
  title =	{{Comparing Different Methods for Disfluency Structure Detection}},
  booktitle =	{2nd Symposium on Languages, Applications and Technologies},
  pages =	{259--269},
  series =	{Open Access Series in Informatics (OASIcs)},
  ISBN =	{978-3-939897-52-1},
  ISSN =	{2190-6807},
  year =	{2013},
  volume =	{29},
  editor =	{Leal, Jos\'{e} Paulo and Rocha, Ricardo and Sim\~{o}es, Alberto},
  publisher =	{Schloss Dagstuhl -- Leibniz-Zentrum f{\"u}r Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{https://drops-dev.dagstuhl.de/entities/document/10.4230/OASIcs.SLATE.2013.259},
  URN =		{urn:nbn:de:0030-drops-40420},
  doi =		{10.4230/OASIcs.SLATE.2013.259},
  annote =	{Keywords: Machine learning, speech processing, prosodic features, automatic detection of disfluencies}
}

Search Results

Documents authored by Batista, Fernando
